[MLAS] Integrate KleidiAI BF16 SME2 Kernel Through Mlas SBGEMM Path#26773
patryk-kaiser-ARM wants to merge 1 commit into microsoft:main
@microsoft-github-policy-service agree company="Arm" |
Pull request overview
This PR integrates the Arm® KleidiAI™ SME2 BF16 kernel into the MLAS SBGEMM (single-precision to bfloat16 GEMM) path. The integration improves the performance of bfloat16 matrix multiplication on Arm devices with SME2 support.
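As a rough illustration of what "single-precision to bfloat16 GEMM" means numerically, the sketch below narrows fp32 inputs to bfloat16 and accumulates the products in fp32. The helper names (`FloatToBf16`, `Bf16ToFloat`, `ReferenceSbgemm`) are hypothetical, and the round-to-nearest-even narrowing is an assumption; nothing here is taken from MLAS or KleidiAI, and NaN handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: narrow an fp32 value to bfloat16 (the top 16 bits),
// rounding to nearest even. NaN inputs are not handled in this sketch.
inline std::uint16_t FloatToBf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // rounding bias, ties go to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bfloat16 back to fp32 (exact, no rounding).
inline float Bf16ToFloat(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Reference C = A * B for row-major A (M x K) and B (K x N): inputs are
// narrowed to bfloat16, products are accumulated in fp32.
std::vector<float> ReferenceSbgemm(std::size_t M, std::size_t N, std::size_t K,
                                   const std::vector<float>& A,
                                   const std::vector<float>& B) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += Bf16ToFloat(FloatToBf16(A[i * K + k])) *
                       Bf16ToFloat(FloatToBf16(B[k * N + j]));
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}
```

A scalar reference of this kind is mainly useful for checking an optimized kernel's output within a tolerance, since bfloat16 keeps only 8 bits of significand precision.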
Changes:
- Added new `sbgemm_kleidiai.cpp` implementation with KleidiAI BF16 SME2 kernel
- Introduced `BIsPacked` flag to `MLAS_SBGEMM_DATA_PARAMS` to track pre-packed matrix B state
- Added override mechanism in SBGEMM path for KleidiAI kernels on SME2-enabled platforms
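The override mechanism and the `BIsPacked` flag listed above could fit together roughly as in the sketch below. This is not the PR's code: `SbgemmDataParams`, `PlatformTable`, `SbgemmOverrideFn`, `FakeKleidiSbgemm`, and `RunSbgemm` are hypothetical stand-ins, and both the "return false to fall back" convention and the pack-on-the-fly behaviour are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Which code path handled the call (illustrative only).
enum class SbgemmPath { Default, KleidiPrePackedB, KleidiPackedOnTheFly };

// Hypothetical stand-in for MLAS_SBGEMM_DATA_PARAMS; only the BIsPacked
// field name comes from the PR.
struct SbgemmDataParams {
    bool BIsPacked = false;  // true when B was pre-packed by the caller
};

// Hypothetical override signature: returns true if the override handled
// the GEMM, false to request the default implementation.
using SbgemmOverrideFn = bool (*)(std::size_t M, std::size_t N, std::size_t K,
                                  const SbgemmDataParams& params,
                                  SbgemmPath* taken);

// Stand-in for an SME2 kernel wrapper. It performs no arithmetic; it only
// records whether B would have to be packed first.
bool FakeKleidiSbgemm(std::size_t M, std::size_t N, std::size_t K,
                      const SbgemmDataParams& params, SbgemmPath* taken) {
    assert(taken != nullptr);
    if (M == 0 || N == 0 || K == 0) {
        return false;  // assumption: decline degenerate shapes
    }
    *taken = params.BIsPacked ? SbgemmPath::KleidiPrePackedB
                              : SbgemmPath::KleidiPackedOnTheFly;
    return true;
}

// Hypothetical platform table; the override stays null unless the platform
// reports SME2 support at start-up.
struct PlatformTable {
    SbgemmOverrideFn SbgemmOverride = nullptr;
};

// Dispatch: try the registered override first, otherwise use the default path.
SbgemmPath RunSbgemm(const PlatformTable& platform, std::size_t M,
                     std::size_t N, std::size_t K,
                     const SbgemmDataParams& params) {
    SbgemmPath taken = SbgemmPath::Default;
    if (platform.SbgemmOverride != nullptr &&
        platform.SbgemmOverride(M, N, K, params, &taken)) {
        return taken;
    }
    return SbgemmPath::Default;
}
```

Keeping the override behind a nullable function pointer means platforms without SME2 take the existing path unchanged, and carrying the packed state in the parameters lets the kernel wrapper avoid packing B a second time.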
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| onnxruntime/core/mlas/lib/kleidiai/sbgemm_kleidiai.cpp | New implementation of SBGEMM using KleidiAI BF16 SME2 kernel |
| onnxruntime/core/mlas/lib/kleidiai/mlasi_kleidiai.h | Added function declarations for SBGEMM KleidiAI overrides |
| onnxruntime/core/mlas/lib/kai_ukernel_interface.h | Added SBGEMM ukernel interface declaration |
| onnxruntime/core/mlas/lib/kai_ukernel_interface.cpp | Added SBGEMM ukernel instantiation for SME2 |
| onnxruntime/core/mlas/lib/mlasi.h | Added typedef declarations for SBGEMM override functions |
| onnxruntime/core/mlas/lib/sbgemm.h | Added override mechanism to call KleidiAI SBGEMM functions |
| onnxruntime/core/mlas/lib/platform.cpp | Registered KleidiAI SBGEMM overrides for SME2-enabled platforms |
| onnxruntime/core/mlas/inc/mlas.h | Added BIsPacked field to MLAS_SBGEMM_DATA_PARAMS struct |
| onnxruntime/core/providers/cpu/math/matmul.cc | Set BIsPacked flag when using pre-packed matrix B |
| onnxruntime/test/mlas/unittest/test_sbgemm.h | Updated tests to initialize and set BIsPacked flag |
| cmake/onnxruntime_mlas.cmake | Added sbgemm_kleidiai.cpp to build system |
|
Hi @patryk-kaiser-ARM / @damdoo01-arm - Can you please resolve the conflicts for this PR if it is still on the agenda? We can target merging this PR next. Thanks. |
Signed-off-by: Patryk Kaiser <patryk.kaiser@arm.com>
51617ca to 509c420
|
Hi @hariharans29, I resolved the conflicts. This one is still on the agenda. I am currently investigating adding fastmath support to more operators so that this change can have a larger impact; however, it would be a good idea to get this one in first and then open subsequent PRs to bring more ops down this path for fastmath. |
|
Can the workflows be approved, please? |
|
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 4 pipeline(s). |
Description
This PR integrates the Arm® KleidiAI™ SME2 BF16 kernel through the MLAS SBGEMM path.
Rework of #24346
Motivation and Context
This kernel provides performance improvements on SME-enabled devices.